Sample size effects on reliability

Differences in HDDM parameter reliability for T1 data using either n=552 or n=150 were examined in a separate report on T1 HDDM parameters. No meaningful differences were found between these two sample sizes.

But even 150 is a large sample size for psychological studies, especially for studies using the forced-choice reaction time tasks included in this report. Here we look at how the reliability of raw and DDM measures changes for sample sizes that are more common in studies using these tasks (10, 15, 20, 25, 50, 75, 100, 125).

Note: HDDMs are not refit for each of these sample sizes because (a) there were no differences in parameter stability for n=150 vs. 552 and (b) a more comprehensive comparison using non-hierarchical estimates and model fit indices will follow. [Should I revisit this? - 150 and 552 might be too large to lead to changes in parameter estimates, but smaller samples that are more common in psych studies might sway estimates more. If this were the case, then wouldn’t we expect the comparison of non-hierarchical vs. hierarchical estimates to be the largest? If there is no difference, then we don’t have to worry about it?]

Note: Some variables do not have enough variance to calculate reliability for different sample sizes. These variables are:
>stroop.post_error_slowing
>simon.std_rt_error
>shape_matching.post_error_slowing
>directed_forgetting.post_error_slowing
>choice_reaction_time.post_error_slowing
>choice_reaction_time.std_rt_error
>dot_pattern_expectancy.post_error_slowing
>motor_selective_stop_signal.go_rt_std_error
>motor_selective_stop_signal.go_rt_error
>attention_network_task.post_error_slowing
>recent_probes.post_error_slowing
>simon.post_error_slowing
>dot_pattern_expectancy.BY_errors

source('/Users/zeynepenkavi/Dropbox/PoldrackLab/SRO_DDM_Analyses/code/workspace_scripts/ddm_reldf_sample_size.R')
Warning: Column `dv` joining factor and character vector, coercing into
character vector

Does the mean reliability change with sample size?

Yes. The larger the sample size, the more reliable a given measure is on average. Among sample sizes of 25 and above, the largest increase in reliability comes from shifting from 25 to 50 subjects. This is important because many studies using these measures have sample sizes <50 per group.

fig_name = 'rel_by_samplesize.jpeg'

knitr::include_graphics(paste0(fig_path, fig_name))

With fewer than 15 subjects (that is, n=10, the reference level in the model below), the measures are significantly less reliable.

summary(lmer(icc ~ factor(sample_size) + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: icc ~ factor(sample_size) + (1 | dv) + (1 | iteration)
   Data: rel_df_sample_size

REML criterion at convergence: 2823478

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-605.6    0.0    0.0    0.0    0.4 

Random effects:
 Groups    Name        Variance Std.Dev.
 dv        (Intercept)  0.06528 0.2555  
 iteration (Intercept)  0.00427 0.0653  
 Residual              65.64367 8.1021  
Number of obs: 402034, groups:  dv, 505; iteration, 100

Fixed effects:
                       Estimate Std. Error t value
(Intercept)              0.1251     0.0386    3.24
factor(sample_size)15    0.2470     0.0513    4.81
factor(sample_size)20    0.2780     0.0513    5.42
factor(sample_size)25    0.2947     0.0512    5.76
factor(sample_size)50    0.3222     0.0512    6.30
factor(sample_size)75    0.3303     0.0512    6.45
factor(sample_size)100   0.3339     0.0512    6.53
factor(sample_size)125   0.3353     0.0512    6.55

Correlation of Fixed Effects:
            (Intr) f(_)15 f(_)20 f(_)25 f(_)50 f(_)75 f(_)10
fctr(sm_)15 -0.665                                          
fctr(sm_)20 -0.665  0.500                                   
fctr(sm_)25 -0.667  0.502  0.502                            
fctr(sm_)50 -0.667  0.502  0.502  0.504                     
fctr(sm_)75 -0.667  0.502  0.502  0.504  0.504              
fctr(s_)100 -0.667  0.502  0.502  0.504  0.504  0.504       
fctr(s_)125 -0.667  0.502  0.502  0.504  0.504  0.504  0.504

Are there differences between any other sample sizes? This ignores the differences between variables, but the only differences seem to be between n=10 and every larger sample size.

with(rel_df_sample_size_summary, pairwise.t.test(mean_icc, sample_size, p.adjust.method = "bonferroni"))

    Pairwise comparisons using t tests with pooled SD 

data:  mean_icc and sample_size 

    10    15 20 25 50 75 100
15  2e-04 -  -  -  -  -  -  
20  9e-06 1  -  -  -  -  -  
25  2e-06 1  1  -  -  -  -  
50  9e-08 1  1  1  -  -  -  
75  4e-08 1  1  1  1  -  -  
100 3e-08 1  1  1  1  1  -  
125 2e-08 1  1  1  1  1  1  

P value adjustment method: bonferroni 

Does the change in reliability with sample size vary by variable type?

No. The changes do not differ by raw vs. ddm measures or for contrast and condition measures compared to non-contrast measures. Contrast and condition measures are just less reliable overall.

summary(lmer(icc ~ sample_size * ddm_raw + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: icc ~ sample_size * ddm_raw + (1 | dv) + (1 | iteration)
   Data: rel_df_sample_size

REML criterion at convergence: 2805433

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-603.4    0.0    0.0    0.0    0.4 

Random effects:
 Groups    Name        Variance Std.Dev.
 dv        (Intercept)  0.06500 0.2550  
 iteration (Intercept)  0.00433 0.0658  
 Residual              66.14286 8.1328  
Number of obs: 399034, groups:  dv, 499; iteration, 100

Fixed effects:
                        Estimate Std. Error t value
(Intercept)             0.349578   0.030865   11.33
sample_size             0.001260   0.000400    3.15
ddm_rawraw             -0.105882   0.049825   -2.13
sample_size:ddm_rawraw  0.000831   0.000662    1.26

Correlation of Fixed Effects:
            (Intr) smpl_s ddm_rw
sample_size -0.681              
ddm_rawraw  -0.591  0.422       
smpl_sz:dd_  0.412 -0.605 -0.697
summary(lmer(icc ~ sample_size * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: 
icc ~ sample_size * overall_difference + (1 | dv) + (1 | iteration)
   Data: rel_df_sample_size

REML criterion at convergence: 2805380

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-603.4    0.0    0.0    0.0    0.3 

Random effects:
 Groups    Name        Variance Std.Dev.
 dv        (Intercept)  0.04637 0.2153  
 iteration (Intercept)  0.00433 0.0658  
 Residual              66.14275 8.1328  
Number of obs: 399034, groups:  dv, 499; iteration, 100

Fixed effects:
                                         Estimate Std. Error t value
(Intercept)                              0.518096   0.044356   11.68
sample_size                              0.000621   0.000602    1.03
overall_differencecontrast              -0.459839   0.065919   -6.98
overall_differencecondition             -0.211001   0.054844   -3.85
sample_size:overall_differencecontrast   0.001230   0.000905    1.36
sample_size:overall_differencecondition  0.001343   0.000753    1.78

Correlation of Fixed Effects:
                       (Intr) smpl_s ovrll_dffrnccnt ovrll_dffrnccnd
sample_size            -0.713                                       
ovrll_dffrnccnt        -0.658  0.480                                
ovrll_dffrnccnd        -0.791  0.577  0.532                         
smpl_sz:vrll_dffrnccnt  0.475 -0.665 -0.721          -0.384         
smpl_sz:vrll_dffrnccnd  0.571 -0.800 -0.384          -0.721         
                       smpl_sz:vrll_dffrnccnt
sample_size                                  
ovrll_dffrnccnt                              
ovrll_dffrnccnd                              
smpl_sz:vrll_dffrnccnt                       
smpl_sz:vrll_dffrnccnd  0.532                

Does variability of reliability change with sample size?

Trending but not significant. The SEMs are always pretty small.

rel_df_sample_size_summary %>%
  na.exclude() %>%
  ggplot(aes(factor(sample_size), sem_icc))+
  geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
  facet_wrap(~overall_difference)+
  ylab("Standard error of mean of reliability \n of 100 samples of size n")+
  xlab("Sample size")+
  theme(legend.title = element_blank(),
        legend.position = "bottom")+
  ylim(0,0.3)
Warning: Removed 16 rows containing missing values (geom_path).

summary(lmer(sem_icc ~ sample_size * overall_difference + (1|dv), rel_df_sample_size_summary))
Linear mixed model fit by REML ['lmerMod']
Formula: sem_icc ~ sample_size * overall_difference + (1 | dv)
   Data: rel_df_sample_size_summary

REML criterion at convergence: 9708

Scaled residuals: 
   Min     1Q Median     3Q    Max 
 -0.12  -0.06  -0.02   0.01  60.38 

Random effects:
 Groups   Name        Variance Std.Dev.
 dv       (Intercept) 0.000213 0.0146  
 Residual             0.657690 0.8110  
Number of obs: 3992, groups:  dv, 499

Fixed effects:
                                         Estimate Std. Error t value
(Intercept)                              0.038630   0.039761    0.97
sample_size                             -0.000334   0.000600   -0.56
overall_differencecontrast               0.048118   0.059791    0.80
overall_differencecondition              0.090710   0.049734    1.82
sample_size:overall_differencecontrast  -0.000466   0.000902   -0.52
sample_size:overall_differencecondition -0.000986   0.000750   -1.31

Correlation of Fixed Effects:
                       (Intr) smpl_s ovrll_dffrnccnt ovrll_dffrnccnd
sample_size            -0.792                                       
ovrll_dffrnccnt        -0.665  0.527                                
ovrll_dffrnccnd        -0.799  0.633  0.532                         
smpl_sz:vrll_dffrnccnt  0.527 -0.665 -0.792          -0.421         
smpl_sz:vrll_dffrnccnd  0.633 -0.799 -0.421          -0.792         
                       smpl_sz:vrll_dffrnccnt
sample_size                                  
ovrll_dffrnccnt                              
ovrll_dffrnccnd                              
smpl_sz:vrll_dffrnccnt                       
smpl_sz:vrll_dffrnccnd  0.532                

Does between subjects variance change with sample size?

Yes. Between subjects variance decreases with sample size. This is more pronounced for non-contrast measures.

This goes against my intuitions. Looking at the change in the between-subjects percentage for individual measures, there seems to be a lot of inter-measure variance (more pronounced below for within-subject variance). I’m not sure whether the measures that show increasing between-subjects variability with sample size have something in common that separates them from those that show decreasing between-subjects variability with sample size (the slight majority).

tmp = rel_df_sample_size_summary %>%
  na.exclude()%>%
  group_by(overall_difference, sample_size, ddm_raw) %>%
  summarise(mean_var_subs_pct = mean(mean_var_subs_pct, na.rm=T))

rel_df_sample_size_summary %>%
  na.exclude() %>%
  ggplot(aes(factor(sample_size), mean_var_subs_pct))+
  geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
  geom_line(data = tmp, aes(factor(sample_size),mean_var_subs_pct, color=ddm_raw, group=ddm_raw))+
  geom_point(data = tmp, aes(factor(sample_size),mean_var_subs_pct, color=ddm_raw))+
  facet_wrap(~overall_difference)+
  ylab("Mean percentage of \n between subjects variance \n of 100 samples of size n")+
  xlab("Sample size")+
  theme(legend.title = element_blank(),
        legend.position = "bottom")

summary(lmer(var_subs_pct ~ factor(sample_size) * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: var_subs_pct ~ factor(sample_size) * overall_difference + (1 |  
    dv) + (1 | iteration)
   Data: rel_df_sample_size

REML criterion at convergence: 3274121

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-4.931 -0.705  0.054  0.704  4.571 

Random effects:
 Groups    Name        Variance Std.Dev.
 dv        (Intercept) 109.05   10.44   
 iteration (Intercept)   1.26    1.12   
 Residual              212.49   14.58   
Number of obs: 399034, groups:  dv, 499; iteration, 100

Fixed effects:
                                                   Estimate Std. Error
(Intercept)                                          57.911      0.898
factor(sample_size)15                                -0.139      0.175
factor(sample_size)20                                -0.467      0.175
factor(sample_size)25                                -0.964      0.175
factor(sample_size)50                                -3.428      0.174
factor(sample_size)75                                -5.652      0.174
factor(sample_size)100                               -7.458      0.174
factor(sample_size)125                               -9.173      0.174
overall_differencecontrast                          -14.208      1.340
overall_differencecondition                          -5.099      1.115
factor(sample_size)15:overall_differencecontrast      0.298      0.262
factor(sample_size)20:overall_differencecontrast      0.611      0.262
factor(sample_size)25:overall_differencecontrast      1.143      0.262
factor(sample_size)50:overall_differencecontrast      3.440      0.262
factor(sample_size)75:overall_differencecontrast      5.385      0.262
factor(sample_size)100:overall_differencecontrast     6.711      0.262
factor(sample_size)125:overall_differencecontrast     8.236      0.262
factor(sample_size)15:overall_differencecondition     0.219      0.218
factor(sample_size)20:overall_differencecondition     0.607      0.218
factor(sample_size)25:overall_differencecondition     0.897      0.218
factor(sample_size)50:overall_differencecondition     1.993      0.218
factor(sample_size)75:overall_differencecondition     2.978      0.218
factor(sample_size)100:overall_differencecondition    3.532      0.218
factor(sample_size)125:overall_differencecondition    4.155      0.218
                                                   t value
(Intercept)                                          64.47
factor(sample_size)15                                -0.80
factor(sample_size)20                                -2.68
factor(sample_size)25                                -5.53
factor(sample_size)50                               -19.64
factor(sample_size)75                               -32.39
factor(sample_size)100                              -42.74
factor(sample_size)125                              -52.57
overall_differencecontrast                          -10.60
overall_differencecondition                          -4.57
factor(sample_size)15:overall_differencecontrast      1.14
factor(sample_size)20:overall_differencecontrast      2.33
factor(sample_size)25:overall_differencecontrast      4.36
factor(sample_size)50:overall_differencecontrast     13.12
factor(sample_size)75:overall_differencecontrast     20.54
factor(sample_size)100:overall_differencecontrast    25.60
factor(sample_size)125:overall_differencecontrast    31.41
factor(sample_size)15:overall_differencecondition     1.00
factor(sample_size)20:overall_differencecondition     2.78
factor(sample_size)25:overall_differencecondition     4.11
factor(sample_size)50:overall_differencecondition     9.14
factor(sample_size)75:overall_differencecondition    13.65
factor(sample_size)100:overall_differencecondition   16.19
factor(sample_size)125:overall_differencecondition   19.05

Correlation matrix not shown by default, as p = 24 > 12.
Use print(x, correlation=TRUE)  or
    vcov(x)        if you need it

Does within subjects variance change with sample size?

Yes. Within-subject variance increases with sample size. This again goes against my intuition, but here the inter-measure differences are even more pronounced. There appear to be some measures for which the change between two measurements at different time points is larger the more subjects are tested, and others that show a small decrease in within-subject variance with larger sample sizes. I still don’t know if anything distinguishes these two types of measures.

tmp = rel_df_sample_size_summary %>%
  na.exclude()%>%
  group_by(overall_difference, sample_size, ddm_raw) %>%
  summarise(mean_var_ind_pct = mean(mean_var_ind_pct, na.rm=T))

rel_df_sample_size_summary %>%
  na.exclude() %>%
  ggplot(aes(factor(sample_size), mean_var_ind_pct))+
  geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
  geom_line(data = tmp, aes(factor(sample_size),mean_var_ind_pct, color=ddm_raw, group=ddm_raw))+
  geom_point(data = tmp, aes(factor(sample_size),mean_var_ind_pct, color=ddm_raw))+
  facet_wrap(~overall_difference)+
  ylab("Mean percentage of \n within subjects variance \n of 100 samples of size n")+
  xlab("Sample size")+
  theme(legend.title = element_blank(),
        legend.position = "bottom")

summary(lmer(var_ind_pct ~ factor(sample_size) * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: var_ind_pct ~ factor(sample_size) * overall_difference + (1 |  
    dv) + (1 | iteration)
   Data: rel_df_sample_size

REML criterion at convergence: 3471555

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-4.551 -0.750 -0.171  0.698  4.326 

Random effects:
 Groups    Name        Variance Std.Dev.
 dv        (Intercept) 132.28   11.50   
 iteration (Intercept)   1.41    1.19   
 Residual              348.68   18.67   
Number of obs: 399034, groups:  dv, 499; iteration, 100

Fixed effects:
                                                   Estimate Std. Error
(Intercept)                                          19.234      0.992
factor(sample_size)15                                 0.504      0.224
factor(sample_size)20                                 1.149      0.224
factor(sample_size)25                                 1.844      0.224
factor(sample_size)50                                 5.282      0.224
factor(sample_size)75                                 8.154      0.224
factor(sample_size)100                               10.486      0.224
factor(sample_size)125                               12.721      0.224
overall_differencecontrast                            4.621      1.481
overall_differencecondition                           1.928      1.232
factor(sample_size)15:overall_differencecontrast     -0.606      0.336
factor(sample_size)20:overall_differencecontrast     -1.230      0.336
factor(sample_size)25:overall_differencecontrast     -1.885      0.336
factor(sample_size)50:overall_differencecontrast     -4.844      0.336
factor(sample_size)75:overall_differencecontrast     -7.225      0.336
factor(sample_size)100:overall_differencecontrast    -8.779      0.336
factor(sample_size)125:overall_differencecontrast   -10.817      0.336
factor(sample_size)15:overall_differencecondition    -0.297      0.280
factor(sample_size)20:overall_differencecondition    -0.634      0.280
factor(sample_size)25:overall_differencecondition    -0.928      0.279
factor(sample_size)50:overall_differencecondition    -2.201      0.279
factor(sample_size)75:overall_differencecondition    -3.192      0.279
factor(sample_size)100:overall_differencecondition   -3.630      0.279
factor(sample_size)125:overall_differencecondition   -4.312      0.279
                                                   t value
(Intercept)                                          19.39
factor(sample_size)15                                 2.25
factor(sample_size)20                                 5.14
factor(sample_size)25                                 8.25
factor(sample_size)50                                23.63
factor(sample_size)75                                36.48
factor(sample_size)100                               46.91
factor(sample_size)125                               56.91
overall_differencecontrast                            3.12
overall_differencecondition                           1.57
factor(sample_size)15:overall_differencecontrast     -1.80
factor(sample_size)20:overall_differencecontrast     -3.66
factor(sample_size)25:overall_differencecontrast     -5.61
factor(sample_size)50:overall_differencecontrast    -14.42
factor(sample_size)75:overall_differencecontrast    -21.51
factor(sample_size)100:overall_differencecontrast   -26.14
factor(sample_size)125:overall_differencecontrast   -32.21
factor(sample_size)15:overall_differencecondition    -1.06
factor(sample_size)20:overall_differencecondition    -2.27
factor(sample_size)25:overall_differencecondition    -3.32
factor(sample_size)50:overall_differencecondition    -7.88
factor(sample_size)75:overall_differencecondition   -11.42
factor(sample_size)100:overall_differencecondition  -12.99
factor(sample_size)125:overall_differencecondition  -15.43

Correlation matrix not shown by default, as p = 24 > 12.
Use print(x, correlation=TRUE)  or
    vcov(x)        if you need it

Does residual variance change with sample size?

tmp = rel_df_sample_size_summary %>%
  na.exclude()%>%
  group_by(overall_difference, sample_size, ddm_raw) %>%
  summarise(mean_var_resid_pct = mean(mean_var_resid_pct, na.rm=T))

rel_df_sample_size_summary %>%
  na.exclude() %>%
  ggplot(aes(factor(sample_size), mean_var_resid_pct))+
  geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
  geom_line(data = tmp, aes(factor(sample_size),mean_var_resid_pct, color=ddm_raw, group=ddm_raw))+
  geom_point(data = tmp, aes(factor(sample_size),mean_var_resid_pct, color=ddm_raw))+
  facet_wrap(~overall_difference)+
  ylab("Mean percentage of residual variance \n of 100 samples of size n")+
  xlab("Sample size")+
  theme(legend.title = element_blank(),
        legend.position = "bottom")

summary(lmer(var_resid_pct ~ factor(sample_size) * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: var_resid_pct ~ factor(sample_size) * overall_difference + (1 |  
    dv) + (1 | iteration)
   Data: rel_df_sample_size

REML criterion at convergence: 2942477

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-4.591 -0.624 -0.037  0.560  8.251 

Random effects:
 Groups    Name        Variance Std.Dev.
 dv        (Intercept) 39.93    6.319   
 iteration (Intercept)  0.44    0.663   
 Residual              92.57    9.621   
Number of obs: 399034, groups:  dv, 499; iteration, 100

Fixed effects:
                                                   Estimate Std. Error
(Intercept)                                         22.8548     0.5443
factor(sample_size)15                               -0.3645     0.1153
factor(sample_size)20                               -0.6816     0.1152
factor(sample_size)25                               -0.8793     0.1152
factor(sample_size)50                               -1.8549     0.1152
factor(sample_size)75                               -2.5014     0.1152
factor(sample_size)100                              -3.0273     0.1152
factor(sample_size)125                              -3.5482     0.1152
overall_differencecontrast                           9.5866     0.8124
overall_differencecondition                          3.1707     0.6758
factor(sample_size)15:overall_differencecontrast     0.3080     0.1731
factor(sample_size)20:overall_differencecontrast     0.6197     0.1731
factor(sample_size)25:overall_differencecontrast     0.7418     0.1731
factor(sample_size)50:overall_differencecontrast     1.4046     0.1730
factor(sample_size)75:overall_differencecontrast     1.8407     0.1730
factor(sample_size)100:overall_differencecontrast    2.0673     0.1730
factor(sample_size)125:overall_differencecontrast    2.5810     0.1730
factor(sample_size)15:overall_differencecondition    0.0784     0.1440
factor(sample_size)20:overall_differencecondition    0.0268     0.1440
factor(sample_size)25:overall_differencecondition    0.0313     0.1440
factor(sample_size)50:overall_differencecondition    0.2082     0.1440
factor(sample_size)75:overall_differencecondition    0.2141     0.1440
factor(sample_size)100:overall_differencecondition   0.0988     0.1440
factor(sample_size)125:overall_differencecondition   0.1573     0.1440
                                                   t value
(Intercept)                                          41.99
factor(sample_size)15                                -3.16
factor(sample_size)20                                -5.92
factor(sample_size)25                                -7.63
factor(sample_size)50                               -16.10
factor(sample_size)75                               -21.72
factor(sample_size)100                              -26.28
factor(sample_size)125                              -30.81
overall_differencecontrast                           11.80
overall_differencecondition                           4.69
factor(sample_size)15:overall_differencecontrast      1.78
factor(sample_size)20:overall_differencecontrast      3.58
factor(sample_size)25:overall_differencecontrast      4.29
factor(sample_size)50:overall_differencecontrast      8.12
factor(sample_size)75:overall_differencecontrast     10.64
factor(sample_size)100:overall_differencecontrast    11.95
factor(sample_size)125:overall_differencecontrast    14.92
factor(sample_size)15:overall_differencecondition     0.54
factor(sample_size)20:overall_differencecondition     0.19
factor(sample_size)25:overall_differencecondition     0.22
factor(sample_size)50:overall_differencecondition     1.45
factor(sample_size)75:overall_differencecondition     1.49
factor(sample_size)100:overall_differencecondition    0.69
factor(sample_size)125:overall_differencecondition    1.09

Correlation matrix not shown by default, as p = 24 > 12.
Use print(x, correlation=TRUE)  or
    vcov(x)        if you need it

Conclusion: Larger samples are better for reliability but not necessarily always for the same reasons; for some variables this is due to increasing between subjects variance while for others it’s due to decreasing residual variance (?).

rm(rel_df_sample_size, rel_df_sample_size_summary)
---
title: 'Sample Size Effects on Reliability'
output:
  github_document:
    toc: yes
    toc_float: yes
---

```{r echo=FALSE, message=FALSE, warning=FALSE, include=FALSE}
from_gh=FALSE
fig_path = '/Users/zeynepenkavi/Dropbox/PoldrackLab/SRO_DDM_Analyses/output/figures/'
library(tidyverse)
library(lme4)
```

## Sample size effects on reliability

Differences in HDDM parameter reliability for T1 data using either n=552 or n=150 were examined in a separate report on [T1 HDDM parameters](https://zenkavi.github.io/SRO_DDM_Analyses/output/reports/HDDM150vs522.nb.html). No meaningful differences were found between these two sample sizes.

But even 150 is a large sample size for psychological studies, especially for studies using the forced-choice reaction time tasks included in this report. Here we look at how the reliability of raw and DDM measures changes for sample sizes that are more common in studies using these tasks (10, 15, 20, 25, 50, 75, 100, 125).

*Note:* HDDMs are not refit for each of these sample sizes because (a) there were no differences in parameter stability for n=150 vs. 552 and (b) a more comprehensive comparison using non-hierarchical estimates and model fit indices will follow. *[Should I revisit this? - 150 and 552 might be too large to lead to changes in parameter estimates, but smaller samples that are more common in psych studies might sway estimates more. If this were the case, then wouldn't we expect the comparison of non-hierarchical vs. hierarchical estimates to be the largest? If there is no difference, then we don't have to worry about it?]*

*Note:* Some variables do not have enough variance to calculate reliability for different sample sizes. These variables are:  
>stroop.post_error_slowing  
>simon.std_rt_error  
>shape_matching.post_error_slowing  
>directed_forgetting.post_error_slowing  
>choice_reaction_time.post_error_slowing  
>choice_reaction_time.std_rt_error  
>dot_pattern_expectancy.post_error_slowing  
>motor_selective_stop_signal.go_rt_std_error  
>motor_selective_stop_signal.go_rt_error  
>attention_network_task.post_error_slowing  
>recent_probes.post_error_slowing  
>simon.post_error_slowing  
>dot_pattern_expectancy.BY_errors  

```{r}
source('/Users/zeynepenkavi/Dropbox/PoldrackLab/SRO_DDM_Analyses/code/workspace_scripts/ddm_reldf_sample_size.R')
```
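For reference, the kind of resampling that yields one ICC per measure, sample size and iteration can be sketched as follows. This is a minimal illustration on simulated data, not the contents of `ddm_reldf_sample_size.R`: the simulated scores, the helper names `icc_consistency` and `resample_icc`, the choice of a consistency ICC for two time points, and drawing subjects without replacement are all assumptions. Only the sample sizes and the 100 iterations per sample size are taken from this report.

```r
set.seed(1)

# Hypothetical test-retest data for one measure: 150 subjects, two time points.
n_total <- 150
true_score <- rnorm(n_total)
sim <- data.frame(
  sub_id = seq_len(n_total),
  t1 = true_score + rnorm(n_total, sd = 0.7),
  t2 = true_score + rnorm(n_total, sd = 0.7)
)

# Consistency ICC for single measurements from two-way ANOVA mean squares.
icc_consistency <- function(t1, t2) {
  scores <- cbind(t1, t2)
  n <- nrow(scores)
  k <- ncol(scores)
  grand <- mean(scores)
  ss_subj <- k * sum((rowMeans(scores) - grand)^2)
  ss_time <- n * sum((colMeans(scores) - grand)^2)
  ss_resid <- sum((scores - grand)^2) - ss_subj - ss_time
  ms_subj <- ss_subj / (n - 1)
  ms_resid <- ss_resid / ((n - 1) * (k - 1))
  (ms_subj - ms_resid) / (ms_subj + (k - 1) * ms_resid)
}

# Draw `sample_size` subjects (without replacement, an assumption) n_iter times
# and compute the ICC in each draw.
resample_icc <- function(data, sample_size, n_iter = 100) {
  vapply(seq_len(n_iter), function(i) {
    rows <- sample(nrow(data), sample_size)
    icc_consistency(data$t1[rows], data$t2[rows])
  }, numeric(1))
}

sample_sizes <- c(10, 15, 20, 25, 50, 75, 100, 125)
rel_sketch <- do.call(rbind, lapply(sample_sizes, function(n) {
  data.frame(sample_size = n, iteration = seq_len(100),
             icc = resample_icc(sim, n))
}))

# Mean ICC per sample size for the simulated measure.
aggregate(icc ~ sample_size, rel_sketch, mean)
```

With real data the same loop would run once per measure, giving the long format (one row per `dv`, `sample_size` and `iteration`) that the mixed models below expect.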

Does the mean reliability change with sample size?

Yes. The larger the sample size, the more reliable a given measure is on average. Among sample sizes of 25 and above, the largest increase in reliability comes from shifting from 25 to 50 subjects. This is important because many studies using these measures have sample sizes <50 per group.

```{r}
fig_name = 'rel_by_samplesize.jpeg'

knitr::include_graphics(paste0(fig_path, fig_name))
```

With fewer than 15 subjects (that is, n=10, the reference level in the model below), the measures are significantly less reliable.

```{r}
summary(lmer(icc ~ factor(sample_size) + (1|dv) + (1|iteration), rel_df_sample_size))
```
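Because `sample_size` enters this model as a factor, each fixed effect is a difference from the n=10 reference level rather than from the next smaller sample size. The step-by-step gains can be recovered with a little arithmetic. The sketch below uses the estimates printed in the rendered summary of this model (intercept 0.1251 and the seven offsets); the variable names are illustrative only.

```r
# Fixed-effect estimates copied from the rendered model summary.
intercept <- 0.1251  # estimated mean ICC at the reference level, n = 10
offsets <- c("10" = 0, "15" = 0.2470, "20" = 0.2780, "25" = 0.2947,
             "50" = 0.3222, "75" = 0.3303, "100" = 0.3339, "125" = 0.3353)

# Implied mean ICC at each sample size.
implied_mean_icc <- intercept + offsets

# Gain in mean ICC when moving from one sample size to the next larger one.
step_gain <- diff(implied_mean_icc)
names(step_gain) <- paste(head(names(offsets), -1),
                          tail(names(offsets), -1), sep = "->")

round(implied_mean_icc, 4)
round(step_gain, 4)
```

Read this way, the step from 25 to 50 subjects adds 0.0275 to the mean ICC, the largest of the steps that start at 25 or above, while the step from 10 to 15 adds 0.2470.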

Are there differences between any other sample sizes? This ignores the differences between variables, but the only differences seem to be between n=10 and every larger sample size.

```{r}
with(rel_df_sample_size_summary, pairwise.t.test(mean_icc, sample_size, p.adjust.method = "bonferroni"))
```

Does the change in reliability with sample size vary by variable type?

No. The changes do not differ by raw vs. ddm measures or for contrast and condition measures compared to non-contrast measures. Contrast and condition measures are just less reliable overall.

```{r}
summary(lmer(icc ~ sample_size * ddm_raw + (1|dv) + (1|iteration), rel_df_sample_size))
```

```{r}
summary(lmer(icc ~ sample_size * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
```

Does variability of reliability change with sample size?

Trending but not significant. The SEMs are always pretty small.

```{r}
rel_df_sample_size_summary %>%
  na.exclude() %>%
  ggplot(aes(factor(sample_size), sem_icc))+
  geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
  facet_wrap(~overall_difference)+
  ylab("Standard error of mean of reliability \n of 100 samples of size n")+
  xlab("Sample size")+
  theme(legend.title = element_blank(),
        legend.position = "bottom")+
  ylim(0,0.3)
```

```{r}
summary(lmer(sem_icc ~ sample_size * overall_difference + (1|dv), rel_df_sample_size_summary))
```
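For readers without access to the workspace script, here is one way per-measure columns such as `mean_icc` and `sem_icc` could be derived from per-iteration ICCs. It is a sketch under assumptions: the table `rel_long` and its values are simulated, and the standard error is taken as the standard deviation across iterations divided by the square root of the number of iterations, which may not match the script's definition.

```r
set.seed(2)

# Hypothetical per-iteration table: 2 measures x 3 sample sizes x 100 iterations.
rel_long <- expand.grid(iteration = 1:100,
                        sample_size = c(10, 50, 125),
                        dv = c("measure_a", "measure_b"))
# Simulated ICCs whose spread shrinks with sample size.
rel_long$icc <- rnorm(nrow(rel_long), mean = 0.5,
                      sd = 1 / sqrt(rel_long$sample_size))

# Standard error of the mean across iterations (assumed definition).
sem <- function(x) sd(x) / sqrt(length(x))

# One row per measure and sample size.
rel_summary <- aggregate(icc ~ dv + sample_size, data = rel_long,
                         FUN = function(x) c(mean = mean(x), sem = sem(x)))
rel_summary <- data.frame(rel_summary[c("dv", "sample_size")],
                          mean_icc = rel_summary$icc[, "mean"],
                          sem_icc = rel_summary$icc[, "sem"])
rel_summary
```

With 100 iterations the SEM is one tenth of the standard deviation across iterations, which is one reason the SEMs can look small even when individual ICCs vary a lot.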

Does between subjects variance change with sample size?

Yes. Between subjects variance decreases with sample size. This is more pronounced for non-contrast measures.

This goes against my intuitions. Looking at the change in the between-subjects percentage for individual measures, there seems to be a lot of inter-measure variance (more pronounced below for within-subject variance). I'm not sure whether the measures that show increasing between-subjects variability with sample size have something in common that separates them from those that show decreasing between-subjects variability with sample size (the slight majority).

```{r}
tmp = rel_df_sample_size_summary %>%
  na.exclude()%>%
  group_by(overall_difference, sample_size, ddm_raw) %>%
  summarise(mean_var_subs_pct = mean(mean_var_subs_pct, na.rm=T))

rel_df_sample_size_summary %>%
  na.exclude() %>%
  ggplot(aes(factor(sample_size), mean_var_subs_pct))+
  geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
  geom_line(data = tmp, aes(factor(sample_size),mean_var_subs_pct, color=ddm_raw, group=ddm_raw))+
  geom_point(data = tmp, aes(factor(sample_size),mean_var_subs_pct, color=ddm_raw))+
  facet_wrap(~overall_difference)+
  ylab("Mean percentage of \n between subjects variance \n of 100 samples of size n")+
  xlab("Sample size")+
  theme(legend.title = element_blank(),
        legend.position = "bottom")
```

```{r}
summary(lmer(var_subs_pct ~ factor(sample_size) * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
```
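One thing to keep in mind when reading these models is that the three variance measures are percentages of the same total, so a drop in one has to be offset by a rise in another. The sketch below shows one way such shares could be computed for two time points, using sums of squares from a two-way layout. The function `variance_pct`, the simulated scores, and the mapping of the time-point term to `var_ind_pct` are assumptions for illustration; the script may use a different decomposition.

```r
# Share of total sum of squares attributed to subjects, time points and residual.
variance_pct <- function(t1, t2) {
  scores <- cbind(t1, t2)
  n <- nrow(scores)
  k <- ncol(scores)
  grand <- mean(scores)
  ss_subs <- k * sum((rowMeans(scores) - grand)^2)  # between subjects
  ss_ind <- n * sum((colMeans(scores) - grand)^2)   # between time points
  ss_total <- sum((scores - grand)^2)
  ss_resid <- ss_total - ss_subs - ss_ind           # what is left over
  100 * c(var_subs_pct = ss_subs,
          var_ind_pct = ss_ind,
          var_resid_pct = ss_resid) / ss_total
}

set.seed(3)
true_score <- rnorm(50)
# Hypothetical measure with a small mean shift between time points.
example_pct <- variance_pct(true_score + rnorm(50, sd = 0.5),
                            true_score + 0.3 + rnorm(50, sd = 0.5))
round(example_pct, 1)
sum(example_pct)  # the three shares add up to 100 by construction
```

Under any decomposition of this kind, a falling between-subjects percentage does not by itself mean that between-subjects variance shrank in absolute terms; it can also reflect growth in one of the other two components.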

Does within subjects variance change with sample size?

Yes. Within-subject variance increases with sample size. This again goes against my intuition, but here the inter-measure differences are even more pronounced. There appear to be some measures for which the change between two measurements at different time points is larger the more subjects are tested, and others that show a small decrease in within-subject variance with larger sample sizes. I still don't know if anything distinguishes these two types of measures.

```{r}
tmp = rel_df_sample_size_summary %>%
  na.exclude()%>%
  group_by(overall_difference, sample_size, ddm_raw) %>%
  summarise(mean_var_ind_pct = mean(mean_var_ind_pct, na.rm=T))

rel_df_sample_size_summary %>%
  na.exclude() %>%
  ggplot(aes(factor(sample_size), mean_var_ind_pct))+
  geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
  geom_line(data = tmp, aes(factor(sample_size),mean_var_ind_pct, color=ddm_raw, group=ddm_raw))+
  geom_point(data = tmp, aes(factor(sample_size),mean_var_ind_pct, color=ddm_raw))+
  facet_wrap(~overall_difference)+
  ylab("Mean percentage of \n within subjects variance \n of 100 samples of size n")+
  xlab("Sample size")+
  theme(legend.title = element_blank(),
        legend.position = "bottom")
```

```{r}
summary(lmer(var_ind_pct ~ factor(sample_size) * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
```

Does residual variance change with sample size?

```{r}
tmp = rel_df_sample_size_summary %>%
  na.exclude()%>%
  group_by(overall_difference, sample_size, ddm_raw) %>%
  summarise(mean_var_resid_pct = mean(mean_var_resid_pct, na.rm=T))

rel_df_sample_size_summary %>%
  na.exclude() %>%
  ggplot(aes(factor(sample_size), mean_var_resid_pct))+
  geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
  geom_line(data = tmp, aes(factor(sample_size),mean_var_resid_pct, color=ddm_raw, group=ddm_raw))+
  geom_point(data = tmp, aes(factor(sample_size),mean_var_resid_pct, color=ddm_raw))+
  facet_wrap(~overall_difference)+
  ylab("Mean percentage of residual variance \n of 100 samples of size n")+
  xlab("Sample size")+
  theme(legend.title = element_blank(),
        legend.position = "bottom")
```

```{r}
summary(lmer(var_resid_pct ~ factor(sample_size) * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
```
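To compare the three variance models side by side, the fixed effects can be combined into implied percentages for a given cell. The sketch below does this for n=10 and n=125, for the reference level of `overall_difference` (taken here to be the non-contrast measures, which is an assumption) and for contrast measures. The coefficients are copied from the rendered summaries of the three models; the object names are illustrative.

```r
# Columns: intercept (reference level, n = 10), n = 125 offset,
# contrast offset, and the n = 125 x contrast interaction.
coefs <- rbind(
  var_subs_pct  = c(57.911,  -9.173,  -14.208,   8.236),
  var_ind_pct   = c(19.234,  12.721,    4.621, -10.817),
  var_resid_pct = c(22.8548, -3.5482,   9.5866,  2.5810)
)
colnames(coefs) <- c("intercept", "n125", "contrast", "n125_contrast")

# Implied percentage in each cell is the sum of the relevant coefficients.
implied <- cbind(
  noncontrast_n10  = coefs[, "intercept"],
  noncontrast_n125 = coefs[, "intercept"] + coefs[, "n125"],
  contrast_n10     = coefs[, "intercept"] + coefs[, "contrast"],
  contrast_n125    = rowSums(coefs)
)

round(implied, 2)
round(colSums(implied), 2)  # each column should be close to 100
```

On this reading, for the reference measures the between-subjects share falls from about 57.9 to 48.7 while the within-subjects share rises from about 19.2 to 32.0 and the residual share falls from about 22.9 to 19.3. For contrast measures all three shares move by roughly two points or less between n=10 and n=125.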

*Conclusion:* Larger samples are better for reliability but not necessarily always for the same reasons; for some variables this is due to increasing between subjects variance while for others it's due to decreasing residual variance (?).

```{r}
rm(rel_df_sample_size, rel_df_sample_size_summary)
```